Sequence error (SE) minimization training of neural network for voice conversion
Abstract
Neural network (NN) based voice conversion, which employs a nonlinear function to map features from a source speaker to a target speaker, has been shown to outperform GMM-based voice conversion [4-7]. However, NN-based voice conversion still has limitations to overcome. For example, the NN is trained with a Frame Error (FE) minimization criterion: its weights are adjusted to minimize the sum of squared errors, computed frame by frame, over the whole stereo (parallel source-target) training set. In this paper, we borrow the idea of sentence-level optimization from minimum generation error (MGE) training in HMM-based TTS synthesis and replace FE minimization with Sequence Error (SE) minimization when training the NN for voice conversion. The conversion error over each training sentence, from the source speaker to the target speaker, is minimized with a gradient-descent-based back-propagation (BP) procedure. Experimental results show that speech converted by an NN that is first trained with FE minimization and then refined with SE minimization sounds subjectively better than speech converted by an NN trained with FE minimization only. Scores for both naturalness and similarity to the target speaker improve.
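The abstract does not spell out the exact form of the SE criterion, so the following is only a sketch under stated assumptions, modeled on MGE training. We assume the network predicts static and delta features for every frame, that the sentence-level static trajectory is recovered from those outputs by parameter generation with unit variances, and that SE is the squared error of that generated trajectory against the target sentence. The delta window, the unit variances, the one-dimensional features and every function name here (`delta_window_matrix`, `frame_error`, `sequence_error_and_grad`) are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def delta_window_matrix(T):
    """Return W (2T x T) mapping a static trajectory c to stacked [static; delta].

    Assumption (hypothetical): delta[t] = 0.5 * (c[t+1] - c[t-1]), with frame
    indices clamped at the sentence boundaries.
    """
    D = np.zeros((T, T))
    for t in range(T):
        D[t, min(t + 1, T - 1)] += 0.5
        D[t, max(t - 1, 0)] -= 0.5
    return np.vstack([np.eye(T), D])


def frame_error(o, y_static, y_delta):
    """FE criterion: each frame's static and delta outputs are compared with
    the target independently, and the squared errors are summed."""
    T = len(y_static)
    return float(np.sum((o[:T] - y_static) ** 2) + np.sum((o[T:] - y_delta) ** 2))


def sequence_error_and_grad(o, y_static, W):
    """SE criterion (assumed MGE-style): generate the whole static trajectory
    from the network outputs o, then take its squared error against the
    target sentence. Also returns dSE/do, which BP would push into the NN."""
    A = W.T @ W                          # symmetric, positive definite
    c = np.linalg.solve(A, W.T @ o)      # unit-variance parameter generation
    r = c - y_static
    se = float(r @ r)
    # dc/do = A^{-1} W^T, hence dSE/do = 2 W A^{-1} r (A is symmetric).
    grad = 2.0 * W @ np.linalg.solve(A, r)
    return se, grad


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T = 8
    W = delta_window_matrix(T)
    y = np.sin(np.linspace(0.0, np.pi, T))        # toy one-dimensional target
    o = W @ y + 0.1 * rng.standard_normal(2 * T)  # stand-in for NN outputs
    print("FE:", frame_error(o, y, W[T:] @ y))
    for step in range(3):
        se, g = sequence_error_and_grad(o, y, W)
        print(f"step {step}: SE = {se:.6f}")
        # Toy update on the outputs themselves; in NN training, g would be
        # back-propagated to the network weights instead.
        o = o - 0.1 * g
```

Under these assumptions, the generation step couples all frames of a sentence, so the SE gradient at one frame depends on the errors at neighbouring frames. Under FE, by contrast, each output's gradient is simply twice its own frame error.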
Similar resources
Surface Tension Prediction of Hydrocarbon Mixtures Using Artificial Neural Network
In this study, an artificial neural network was used to predict the surface tension of 20 hydrocarbon mixtures. The experimental data were divided into two parts (70% for training and 30% for testing). The optimal network configuration was obtained by minimizing the prediction error on the test data. The accuracy of our proposed model was compared with four well-known empirical equations. The arti...
Neural Network Based Recognition System Integrating Feature Extraction and Classification for English Handwritten
Handwriting recognition has been one of the active and challenging research areas in image processing and pattern recognition. It has numerous applications, including reading aids for the blind, bank cheque processing, and conversion of handwritten documents into structured text. Neural Networks (NNs), with their inherent learning ability, offer promising solutions for handwritten characte...
A Neural Network Method Based on Mittag-Leffler Function for Solving a Class of Fractional Optimal Control Problems
In this paper, a computational intelligence method is used to solve fractional optimal control problems (FOCPs) with equality and inequality constraints. Using the Pontryagin minimum principle (PMP) for FOCPs with fractional derivatives in the Riemann-Liouville sense and constructing a suitable error function, we define an unconstrained minimization problem. In the optimiz...
Emotional Voice Conversion Using Neural Networks with Different Temporal Scales of F0 based on Wavelet Transform
An artificial neural network is one of the most important models for learning features in voice conversion (VC) tasks. Typically, neural networks (NNs) are very effective at processing nonlinear features, such as mel-cepstral coefficients (MCCs), which represent spectral features. However, a simple representation of fundamental frequency (F0) is not enough for neural networks to deal with an...
High-order sequence modeling using speaker-dependent recurrent temporal restricted Boltzmann machines for voice conversion
This paper presents a voice conversion (VC) method that utilizes recently proposed recurrent temporal restricted Boltzmann machines (RTRBMs) for each speaker, with the goal of capturing high-order temporal dependencies in an acoustic sequence. Our algorithm starts from the separate training of two RTRBMs for a source and target speaker using speaker-dependent training data. Since each RTRBM att...
Publication date: 2014